class: center, middle, inverse, title-slide

# APSTA-GE 2003: Intermediate Quantitative Methods
## Lab Section 003, Week 7
### New York University
### 10/20/2020

---

## Reminders

- **Group Assignment**
  - Solutions available [HERE](https://drive.google.com/drive/folders/17GnapKdf_PzsHo1s0vMCx_gFZIq7v5qE?usp=sharing)
  - Comments on group submission available soon
- **Assignment 4**
  - Due: **10/28/2020 11:55pm (EST)**
- Office hours
  - Monday 9 - 10am (EST)
  - Wednesday 12:30 - 1:30pm (EST)
  - Additional time slots available
    - Sign-up sheet [HERE](https://docs.google.com/spreadsheets/d/1YY38yj8uCNIm1E7jaI9TJC494Pye2-Blq9eSK_eh6tI/edit?usp=sharing)
  - Office hour Zoom link
    - https://nyu.zoom.us/j/97347070628 (pin: 2003)
  - Office hour notes
    - Available on NYU Classes
- Updates on the Shiny app soon

---

## Today's Topics

- Review of Regression Analysis
  - Influential vs. Leverage Points
- (Brief) Review of Multiple Regression
- Regression Simulation
  - Using the Simulator on the Shiny App

---

class: inverse, center, middle

# Review of Regression Analysis

---

## Influential vs. Leverage Points

- How to measure distance?
- **Outlier**: a point that lies far from the **regression line** (large residual)
- **Leverage**:
  - the x value of the point is far from the x values of the rest of the data
  - the point lies close to the regression line
- **Influential**:
  - the x value of the point is far from the x values of the rest of the data
  - the point lies far from the regression line

---

## Visualize Leverage Points

- **Linear equation**: `\(Y_i = \beta_1 + \beta_2 \cdot X_i\)`

Add a leverage point.

---

## Check Effects on Leverage Points
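
A minimal sketch of this check, assuming simulated data. The object names (`dat_base`, `lev_point`, `mod_base`, `mod_lev`) and the coefficients 25 and 1.5 are illustrative assumptions, not the data used in lab. It adds one point with an extreme X value that sits on the assumed line, refits the model, and compares the coefficients.

```r
# Hypothetical simulated data; 25 and 1.5 are assumed coefficients
set.seed(2003)
n <- 50
X <- rnorm(n, mean = 3, sd = 1)
Y <- 25 + 1.5 * X + rnorm(n, sd = 2)
dat_base <- data.frame(X = X, Y = Y)

# Leverage point: extreme X, but Y falls on the assumed line
lev_point <- data.frame(X = 10, Y = 25 + 1.5 * 10)
dat_lev_demo <- rbind(dat_base, lev_point)

# Fit the model without and with the added point
mod_base <- lm(Y ~ X, data = dat_base)
mod_lev <- lm(Y ~ X, data = dat_lev_demo)

# Compare intercept and slope side by side
round(rbind(base = coef(mod_base), with_leverage = coef(mod_lev)), 3)

# Hat values measure leverage; find the row with the largest one
hat_lev <- hatvalues(mod_lev)
which.max(hat_lev)
```

Because the added point follows the same linear pattern as the rest of the data, the two sets of coefficients should stay close even though the point has the largest hat value.
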
---

## Visualize Influential Points

- **Linear equation**: `\(Y_i = \beta_1 + \beta_2 \cdot X_i\)`

Add an influential point.

---

## Check Effects on Influential Points
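
A minimal sketch of this check, again assuming simulated data. The object names (`dat_base`, `inf_point`, `mod_base`, `mod_inf`) and the coefficients 25 and 1.5 are illustrative assumptions. Here the added point has an extreme X value and a Y value far below the assumed line, and Cook's distance is used as one common way to flag it.

```r
# Hypothetical simulated data; 25 and 1.5 are assumed coefficients
set.seed(2003)
n <- 50
X <- rnorm(n, mean = 3, sd = 1)
Y <- 25 + 1.5 * X + rnorm(n, sd = 2)
dat_base <- data.frame(X = X, Y = Y)

# Influential point: extreme X and Y far from the assumed line
# (the assumed line gives Y = 40 at X = 10)
inf_point <- data.frame(X = 10, Y = 20)
dat_inf_demo <- rbind(dat_base, inf_point)

# Fit the model without and with the added point
mod_base <- lm(Y ~ X, data = dat_base)
mod_inf <- lm(Y ~ X, data = dat_inf_demo)

# Compare intercept and slope side by side
round(rbind(base = coef(mod_base), with_influential = coef(mod_inf)), 3)

# Cook's distance combines leverage and residual size
cooks_d <- cooks.distance(mod_inf)
which.max(cooks_d)
```

In this setup the slope is pulled downward by the added point, which is what distinguishes an influential point from a leverage point that stays on the line.
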
---

## Compare Visualizations of Leverage and Influential Points

---

## Detecting Influential Points

Locate outliers by their index numbers

```r
library(car)
*plot(mod_lin, which = 1)
```

---

## How to Remove Outliers from Data?

```r
DT::datatable(dat_full, options = list(pageLength = 5))
```

```r
summary(dat_lev)
```

```
##        X                  Y
##  Min.   : 0.06915   Min.   :20.23
##  1st Qu.: 2.38651   1st Qu.:27.82
##  Median : 3.29559   Median :29.93
##  Mean   : 3.33247   Mean   :30.05
##  3rd Qu.: 4.14628   3rd Qu.:32.40
##  Max.   :10.00000   Max.   :38.35
```

---

## How to Remove Outliers from Data? (Continued)

```r
# Option 1: drop rows 51 and 52 with negative indexing
dat_clean <- dat_full[-c(51, 52), ]

# Option 2: keep only rows 1 to 50
dat_clean_2 <- dat_full[1:50, ]

# Check that both options return the same data
table(dat_clean == dat_clean_2)
```

```
##
## TRUE
##  100
```

---

class: inverse, center, middle

# Review of Multiple Regression

---

## Multiple Regression

- Regression with two or more independent variables
- To interpret a coefficient, hold the other independent variables constant
- Equation:

`$$Y = \beta_0 + \beta_1 \cdot X_1 + \beta_2 \cdot X_2 + \cdots + \beta_p \cdot X_p + \varepsilon$$`

- Dummy variables:
  - Regression cannot use categorical variables directly
  - Convert them to numeric (0/1) indicator variables

---

## Demo: Simple Regression Model

```r
dat_demo <- datarium::marketing

# Fit a simple regression model
*summary(mod_sing <- lm(sales ~ youtube, data = dat_demo))
```

```
##
## Call:
## lm(formula = sales ~ youtube, data = dat_demo)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -10.0632  -2.3454  -0.2295   2.4805   8.6548
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.439112   0.549412   15.36   <2e-16 ***
## youtube     0.047537   0.002691   17.67   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.91 on 198 degrees of freedom
## Multiple R-squared:  0.6119, Adjusted R-squared:  0.6099
## F-statistic: 312.1 on 1 and 198 DF,  p-value: < 2.2e-16
```

---

## Demo: Multiple Regression Model

```r
# Fit a multiple regression model
*summary(mod_multi <- lm(sales ~ youtube + facebook + newspaper, data = dat_demo))
```

```
##
## Call:
## lm(formula = sales ~ youtube + facebook + newspaper, data = dat_demo)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -10.5932  -1.0690   0.2902   1.4272   3.3951
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  3.526667   0.374290   9.422   <2e-16 ***
## youtube      0.045765   0.001395  32.809   <2e-16 ***
## facebook     0.188530   0.008611  21.893   <2e-16 ***
## newspaper   -0.001037   0.005871  -0.177     0.86
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.023 on 196 degrees of freedom
## Multiple R-squared:  0.8972, Adjusted R-squared:  0.8956
## F-statistic: 570.3 on 3 and 196 DF,  p-value: < 2.2e-16
```

---

## Demo: Comparison

```r
mod_sing$coefficients
```

```
## (Intercept)     youtube
##  8.43911226  0.04753664
```

```r
mod_multi$coefficients
```

```
##  (Intercept)      youtube     facebook    newspaper
##  3.526667243  0.045764645  0.188530017 -0.001037493
```

---

class: inverse, center, middle

# Regression Simulation

---

## Regression Simulation

**Let's move to the Shiny App.**

---

## Contact

Tong Jin

- Email: tj1061@nyu.edu
- Office Hours
  - Mondays, 9 - 10am (EST)
  - Wednesdays, 12:30 - 1:30pm (EST)